Smart Paper Notes

BSM loss: A superior way in modeling aleatory uncertainty of fine_grained classification

Shuang Ge , Kehong Yuan , Maokun Han , Desheng Sun , Huabin Zhang , Qiongyu Ye

Category: Computer Vision

2022-06-09

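Artificial intelligence (AI)-assisted methods have received much attention in risk-sensitive fields such as disease diagnosis. Unlike classifying disease types, classifying medical images as benign or malignant tumors is a fine-grained task. However, most studies focus only on improving diagnostic accuracy and neglect the evaluation of model reliability, which limits clinical application. In clinical practice, calibration poses a major challenge in the low-data regime, particularly for over-parameterized models and inherent noise. In particular, we find that modeling data-dependent uncertainty is more conducive to confidence calibration. In contrast to test-time augmentation (TTA), we propose a modified bootstrapping loss (BS loss) function with the Mixup data augmentation strategy, which better calibrates predictive uncertainty and captures data distribution shift without additional inference time. Our experiments show that the BS loss with Mixup (BSM) model can halve the expected calibration error (ECE) compared with standard data augmentation, deep ensembles, and MC dropout. Under the BSM model, the correlation between uncertainty and similarity reaches -0.4428. In addition, the BSM model is able to perceive the semantic distance of out-of-domain data, which indicates high potential for real-world clinical practice.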
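
The expected calibration error (ECE) named in this abstract is commonly computed by binning predictions by confidence and averaging the gap between accuracy and mean confidence in each bin, weighted by bin size. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name, the choice of 10 equal-width bins, and the right-open bin edges are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: truthy where the prediction matched the label.
    n_bins: number of equal-width confidence bins (an illustrative default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one prediction is required")

    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0] * n_bins
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        conf_sums[idx] += conf
        hit_sums[idx] += 1 if hit else 0

    total = len(confidences)
    ece = 0.0
    for count, conf_sum, hits in zip(counts, conf_sums, hit_sums):
        if count == 0:
            continue  # empty bins contribute nothing
        accuracy = hits / count
        mean_confidence = conf_sum / count
        ece += (count / total) * abs(accuracy - mean_confidence)
    return ece
```

A model that predicts with confidence 0.9 but is right only three times out of four has a gap of about 0.15 in its single occupied bin, so its ECE is about 0.15; halving ECE means halving this weighted gap.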

Translated by Google Translate

CORGI-PM: A Chinese Corpus For Gender Bias Probing and Mitigation

Ge Zhang , Yizhi Li , Yaoyao Wu , Linyuan Zhang , Chenghua Lin , Jiayi Geng , Shi Wang , Jie Fu

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2023-01-01

As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, the prevalent data-driven techniques such as large-scale language models suffer from data inadequacy and biased corpora, especially for languages with insufficient resources such as Chinese. To this end, we propose CORGI-PM, a Chinese cOrpus foR Gender bIas Probing and Mitigation, which contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Moreover, we address three challenges for automatic textual gender bias mitigation, which require models to detect, classify, and mitigate textual gender bias. We also conduct experiments with state-of-the-art language models to provide baselines. To the best of our knowledge, CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.


Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits

Ruibo Liu , Chenyan Jia , Ge Zhang , Ziyu Zhuang , Tony X Liu , Soroush Vosoughi

Category: Natural Language Processing | Artificial Intelligence

2023-01-01

We present Second Thought, a new learning paradigm that enables language models (LMs) to re-align with human values. By modeling the chain-of-edits between value-unaligned and value-aligned text, with LM fine-tuning and additional refinement through reinforcement learning, Second Thought not only achieves superior performance in three value alignment benchmark datasets but also shows strong human-value transfer learning ability in few-shot scenarios. The generated editing steps also offer better interpretability and ease for interactive error correction. Extensive human evaluations further confirm its effectiveness.


VertMatch: A Semi-supervised Framework for Vertebral Structure Detection in 3D Ultrasound Volume

Hongye Zeng , Kang Zhou , Songhan Ge , Yuchong Gao , Jianhao Zhao , Shenghua Gao , Rui Zheng

Category: Computer Vision

2022-12-28

Three-dimensional (3D) ultrasound imaging has been applied to scoliosis assessment, but the current assessment method uses only the coronal projection image and cannot illustrate the 3D deformity and vertebral rotation. Vertebra detection is essential to reveal 3D spine information, but the detection task is challenging due to complex data and limited annotations. We propose VertMatch, a two-step framework that detects vertebral structures in 3D ultrasound volumes by utilizing unlabeled data in a semi-supervised manner. The first step detects the possible positions of structures on the transverse slice globally, and local patches are then cropped based on the detected positions. The second step determines whether the patches contain real vertebral structures and screens the predicted positions from the first step. VertMatch develops three novel components for semi-supervised learning. For position detection in the first step, (1) an anatomical prior is used to screen pseudo labels generated by the confidence threshold method, and (2) multi-slice consistency is used to utilize more unlabeled data by inputting multiple adjacent slices. For patch identification in the second step, (3) the categories are rebalanced in each batch to solve the imbalance problem. Experimental results demonstrate that VertMatch can detect vertebrae accurately in ultrasound volumes and outperforms state-of-the-art methods. VertMatch is also validated in a clinical application on forty ultrasound scans, and it can be a promising approach for 3D assessment of scoliosis.


Graph Federated Learning with Hidden Representation Sharing

Shuang Wu , Mingxuan Zhang , Yuantong Li , Carl Yang , Pan Li

Category: Machine Learning

2022-12-23

Learning on Graphs (LoG) is widely used in multi-client systems when each client has insufficient local data, and multiple clients have to share their raw data to learn a model of good quality. One scenario is recommending items to clients who have limited historical data and share similar preferences with other clients in a social network. On the other hand, due to the increasing demand for the protection of clients' data privacy, Federated Learning (FL) has been widely adopted: FL requires models to be trained in a multi-client system and restricts the sharing of raw data among clients. The underlying data-sharing conflict between LoG and FL is under-explored, and how to benefit from both sides is a promising problem. In this work, we first formulate the Graph Federated Learning (GFL) problem, which unifies LoG and FL in multi-client systems, and then propose sharing hidden representations instead of the raw data of neighbors as a solution that protects data privacy. To overcome the biased gradient problem in GFL, we provide a gradient estimation method and its convergence analysis under a non-convex objective. In experiments, we evaluate our method on graph classification tasks. Our experiments show a good match between our theory and practice.


Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

Jay Zhangjie Wu , Yixiao Ge , Xintao Wang , Weixian Lei , Yuchao Gu , Wynne Hsu , Ying Shan , Xiaohu Qie , Mike Zheng Shou

Category: Computer Vision

2022-12-22

To reproduce the success of text-to-image (T2I) generation, recent works in text-to-video (T2V) generation employ large-scale text-video datasets for fine-tuning. However, such a paradigm is computationally expensive. Humans have the amazing ability to learn new visual concepts from just one single exemplar. We hereby study a new T2V generation problem, One-Shot Video Generation, where only a single text-video pair is presented for training an open-domain T2V generator. Intuitively, we propose to adapt the T2I diffusion model pretrained on massive image data for T2V generation. We make two key observations: 1) T2I models are able to generate images that align well with the verb terms; 2) extending T2I models to generate multiple images concurrently exhibits surprisingly good content consistency. To further learn continuous motion, we propose Tune-A-Video with a tailored Sparse-Causal Attention, which generates videos from text prompts via an efficient one-shot tuning of pretrained T2I diffusion models. Tune-A-Video is capable of producing temporally coherent videos over various applications such as change of subject or background, attribute editing, and style transfer, demonstrating the versatility and effectiveness of our method.


Pay Attention to Your Tone: Introducing a New Dataset for Polite Language Rewrite

Xun Wang , Tao Ge , Allen Mao , Yuki Li , Furu Wei , Si-Qing Chen

Category: Natural Language Processing

2022-12-20

We introduce PoliteRewrite, a dataset for polite language rewrite, which is a novel sentence rewrite task. Compared with previous text style transfer tasks that can be mostly addressed by slight token- or phrase-level edits, polite language rewrite requires deep understanding and extensive sentence-level edits over an offensive and impolite sentence to deliver the same message euphemistically and politely. This is more challenging, not only for NLP models but also for human annotators, who must put effort into rewriting. To alleviate the human effort and make annotation efficient, we first propose a novel annotation paradigm in which human annotators and GPT-3.5 collaborate to annotate PoliteRewrite. The released dataset has 10K polite sentence rewrites annotated collaboratively by GPT-3.5 and humans, which can be used as a gold standard for training, validation, and testing, and 100K high-quality polite sentence rewrites by GPT-3.5 without human review. We hope this work (the dataset, 10K+100K, will be released soon) contributes to research on more challenging sentence rewriting and provokes more thought on resource annotation paradigms with the help of large-scale pretrained models.


DocAsRef: A Pilot Empirical Study on Repurposing Reference-Based Summary Quality Metrics Reference-Freely

Forrest Sheng Bao , Ruixuan Tu , Ge Luo

Category: Artificial Intelligence | Natural Language Processing

2022-12-20

Summary quality assessment metrics fall into two categories: reference-based and reference-free. Reference-based metrics are theoretically more accurate but are limited by the availability and quality of the human-written references, both of which are difficult to ensure. This has inspired the development of reference-free metrics, which are independent of human-written references, in the past few years. However, existing reference-free metrics cannot be both zero-shot and accurate. In this paper, we propose a zero-shot but accurate reference-free approach in a sneaky way: feeding the documents from which summaries are generated into reference-based metrics as references. Experimental results show that this zero-shot approach can give us the best-performing reference-free metrics on nearly all aspects on several recently released datasets, sometimes even beating reference-free metrics specifically trained for this task. We further investigate which reference-based metrics can benefit from such repurposing and whether our additional tweaks help.
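
The repurposing idea in this abstract can be sketched in a few lines: take any reference-based metric and pass the source document where the human-written reference would normally go. The sketch below uses a toy unigram-overlap F1 as a stand-in metric; this stand-in, the whitespace tokenization, and both function names are assumptions made for illustration, not the metrics or code used in the paper.

```python
from collections import Counter


def unigram_f1(candidate, reference):
    """Toy reference-based metric: F1 over overlapping lowercase unigrams."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def score_reference_free(summary, document, metric=unigram_f1):
    """Use the source document as the reference (hypothetical helper)."""
    return metric(summary, document)


if __name__ == "__main__":
    document = "the cat sat on the mat"
    print(score_reference_free("the cat sat", document))
    print(score_reference_free("dogs bark", document))
```

Because the metric is only called, not trained, the approach stays zero-shot; swapping in a stronger reference-based metric changes only the `metric` argument.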


Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding

Haoli Bai , Zhiguang Liu , Xiaojun Meng , Wentao Li , Shuang Liu , Nian Xie , Rongfu Zheng , Liangwei Wang , Lu Hou , Jiansheng Wei

Category: Natural Language Processing | Computer Vision

2022-12-19

Unsupervised pre-training on millions of digital-born or scanned documents has shown promising advances in visual document understanding (VDU). While various vision-language pre-training objectives are studied in existing solutions, the document textline, as an intrinsic granularity in VDU, has seldom been explored so far. A document textline usually contains words that are spatially and semantically correlated, which can be easily obtained from OCR engines. In this paper, we propose Wukong-Reader, trained with new pre-training objectives to leverage the structural knowledge nested in document textlines. We introduce textline-region contrastive learning to achieve fine-grained alignment between the visual regions and texts of document textlines. Furthermore, masked region modeling and textline-grid matching are also designed to enhance the visual and layout representations of textlines. Experiments show that our Wukong-Reader has superior performance on various VDU tasks such as information extraction. The fine-grained alignment over textlines also empowers Wukong-Reader with promising localization ability.


ColoristaNet for Photorealistic Video Style Transfer

Xiaowen Qiu , Ruize Xu , Boan He , Yingtao Zhang , Wenqiang Zhang , Weifeng Ge

Category: Computer Vision | Machine Learning

2022-12-19

Photorealistic style transfer aims to transfer the artistic style of an image onto an input image or video while keeping photorealism. In this paper, we argue that the summary statistics matching scheme in existing algorithms is what leads to unrealistic stylization. To avoid employing the popular Gram loss, we propose a self-supervised style transfer framework, which contains a style removal part and a style restoration part. The style removal network removes the original image styles, and the style restoration network recovers image styles in a supervised manner. Meanwhile, to address the problems in current feature transformation methods, we propose decoupled instance normalization to decompose feature transformation into style whitening and restylization. It works quite well in ColoristaNet and can transfer image styles efficiently while keeping photorealism. To ensure temporal coherency, we also incorporate optical flow methods and ConvLSTM to embed contextual information. Experiments demonstrate that ColoristaNet can achieve better stylization effects than state-of-the-art algorithms.
